Case Study: Visit with us | Travel Package Purchase Prediction

*Using Ensemble Techniques*


Background & Context

Objective

Data Dictionary -

Customer details:

Customer interaction data:


Loading libraries


Import Dataset

Observation:

Observation:


Overview of the data

Observation:

We have an imbalanced dataset: 81.2% of customers did not buy a travel package and only 18.8% did.
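A minimal sketch of how this class split can be checked with pandas; the DataFrame below is a stand-in reproducing the 81.2% / 18.8% proportions, not the real dataset.

```python
import pandas as pd

# Stand-in target column reproducing the proportions quoted above.
df = pd.DataFrame({"ProdTaken": [0] * 812 + [1] * 188})

# Proportion of each class in the target variable.
class_share = df["ProdTaken"].value_counts(normalize=True)
print(class_share)  # 0 -> 0.812, 1 -> 0.188: imbalanced
```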

Observation:


Summary of the dataset


PreferredPropertyStar and PitchSatisfactionScore are ordinal categorical variables. We will keep them as numerical and proceed with label encoding, since there is a sense of order in their values.

Passport and OwnCar are binary categorical variables; we will keep them as numerical and proceed with label encoding.
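A sketch of the binary encoding step. The Yes/No string values here are hypothetical (in the raw data these columns may already be 0/1); the point is that a fixed mapping keeps the two levels as numbers.

```python
import pandas as pd

# Hypothetical slice; column names follow the data dictionary.
df = pd.DataFrame({
    "Passport": ["Yes", "No", "Yes"],
    "OwnCar":   ["No", "Yes", "Yes"],
})

# Binary categoricals: a simple 0/1 mapping is equivalent to label encoding
# and keeps the encoding explicit and reproducible.
for col in ["Passport", "OwnCar"]:
    df[col] = df[col].map({"No": 0, "Yes": 1})
```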


Feature Engineering

Fixing Gender category

Dropping Customer ID

Checking Duration Of Pitch extreme values

We will remove the value 1 and keep 27 and 26.
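A sketch of that removal, assuming the implausibly short pitch of 1 minute is dropped while the longer pitches are kept; the surrounding values are stand-ins.

```python
import pandas as pd

# Stand-in values; 1 is the implausibly short pitch flagged above,
# 27 and 26 are the extreme but plausible values we keep.
df = pd.DataFrame({"DurationOfPitch": [9.0, 1.0, 27.0, 26.0, 15.0]})

# Drop the rows whose pitch duration equals 1.
df = df[df["DurationOfPitch"] != 1].reset_index(drop=True)
```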

Checking Number of Trips extreme values

We will not treat these values as outliers.

Checking Low Monthly Income

Considering that the minimum for Large Business is 16091.0, we will replace the other two values with the minimum for this category.
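A minimal sketch of that replacement; the rows are hypothetical, and 16091.0 is the Large Business minimum cited above.

```python
import pandas as pd

# Hypothetical Large Business rows with two implausibly low incomes.
df = pd.DataFrame({
    "Occupation":    ["Large Business", "Large Business", "Large Business"],
    "MonthlyIncome": [16091.0, 1000.0, 4678.0],
})

# Floor = minimum of the plausible incomes for this category;
# replace anything below it with that floor.
floor = df.loc[df["MonthlyIncome"] >= 16091.0, "MonthlyIncome"].min()
df.loc[df["MonthlyIncome"] < floor, "MonthlyIncome"] = floor
```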

Checking High Monthly Income


EDA

Univariate analysis

Observations on Age

Observations on Duration Of Pitch

Observations on Number Of Person Visiting

Observations on Number Of Follow ups

Observations on Monthly Income

Observations on ProdTaken (target variable)

Observations on CityTier

Observations on Preferred Property Star

Observations on Number Of Trips

Observations on Passport

Observations on Pitch Satisfaction Score

Observations on Own Car

Observations on Number Of Children Visiting

Observations on Type of Contact

Observations on Occupation

Observations on Gender

Observations on Product Pitched

Observations on Marital Status

Observations on Designation

Bivariate Analysis

Observations:

  1. Age vs ProdTaken -> Younger customers tend to buy the travel package.
  2. DurationOfPitch vs ProdTaken -> The mean duration of pitch is greater for customers who buy the travel package.
  3. NumberOfTrips vs ProdTaken -> Does not seem to differentiate customers who buy the travel package from those who don't.
  4. MonthlyIncome vs ProdTaken -> Customers with lower income tend to buy the travel package (correlated with age).

Numerical variables Vs ProdTaken

Categorical variables Vs ProdTaken

Dummy variables

Observations:

  1. TypeOfContact vs ProdTaken -> Customers who self-enquire are more likely to buy the travel package.
  2. Occupation vs ProdTaken -> Small Business and Salaried customers are more likely to buy the travel package.
  3. Gender vs ProdTaken -> More male customers look for and buy the travel package.
  4. ProductPitched vs ProdTaken -> Basic is the most commonly bought package, followed by Deluxe.
  5. MaritalStatus vs ProdTaken -> Single and married customers are the ones who usually buy the travel package.
  6. Designation vs ProdTaken -> Executives are the most likely to buy a travel package, followed by Managers.

Categorical Ordinal variables Vs ProdTaken

Label Encoding

Observations:

  1. CityTier vs ProdTaken -> Customers from developed cities are more likely to buy the travel package.
  2. NumberOfPersonVisiting vs ProdTaken -> 3 is the most common number of persons visiting, followed by 2.
  3. NumberOfFollowups vs ProdTaken -> Customers usually receive 4 follow-ups before deciding whether or not to buy.
  4. PreferredPropertyStar vs ProdTaken -> Customers prefer 3-star properties, followed by 5-star.
  5. PitchSatisfactionScore vs ProdTaken -> 3 is the most common satisfaction score, for customers who bought our product and those who didn't.
  6. NumberOfChildrenVisiting vs ProdTaken -> Customers usually bring 1 child, followed by 2, whether or not they bought our travel package.

Binary variables Vs ProdTaken

Observations:

  1. Passport vs ProdTaken -> Customers with a passport bought more travel packages, although fewer customers have a passport than not.
  2. OwnCar vs ProdTaken -> 62% of customers own a car, and of those only 18.5% bought a travel package. On the other hand, 38% of customers do not own a car, and of those 19.4% bought a travel package. So OwnCar does not seem to be an important feature.
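Those within-group purchase rates can be computed with a row-normalised crosstab. The data below is synthetic, constructed only to reproduce the quoted proportions.

```python
import pandas as pd

# Synthetic data matching the proportions above:
# 620 car owners (115 buyers), 380 non-owners (74 buyers).
df = pd.DataFrame({
    "OwnCar":    [1] * 620 + [0] * 380,
    "ProdTaken": [1] * 115 + [0] * 505 + [1] * 74 + [0] * 306,
})

# normalize="index" gives the purchase rate within each OwnCar group.
rates = pd.crosstab(df["OwnCar"], df["ProdTaken"], normalize="index")
print(rates)  # ~18.5% buyers among owners, ~19.5% among non-owners
```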

Multivariate Analysis

Observations:

30% of customers have a duration of pitch between 5 and 10 minutes; this is the time our team has to convince the customer to buy our package, or at least to keep them listening for longer.

Observations:

Observation:

We can see a pattern here.

Observation:

Passport

Customers with or without a passport seem to behave almost the same with respect to Monthly Income, Age, Number of Followups, Number Of Person Visiting and Number of Trips. Later we will check the behaviour only for customers who bought the travel package.

NumberOfTrips

Monthly Income vs Product Pitched

We can see a pattern here where customers with:

Age vs Product Pitched

Here we can clearly see the correlation between Age and Product Pitched: the younger the customer, the more basic the package.

Age vs MonthlyIncome

Here we can clearly see the correlation between Age and Monthly Income. As expected, the older the customer, the higher the income usually is.

Putting it all together, we can see something of a pattern between Age, Income and ProductPitched.

Designation per Monthly Income by ProdTaken

NumberOfChildrenVisiting per Monthly Income and MaritalStatus

TypeofContact


Customer Profile

Number Of Followups Customers who DID buy the package

Number Of Followups Customers who DID NOT buy the package

Duration Of Pitch Customers who DID buy the package

Duration Of Pitch Customers who DID NOT buy the package


Missing Values


We'll have to check whether there is a pattern in the missing values by grouping by some of the other variables.


DurationOfPitch

MonthlyIncome

Age

Number Of Trips

Number Of Children Visiting

Number Of Followups

Prefer Property Star

Type Of Contact

As checked in the EDA, TypeOfContact shows no pattern with the other variables. Considering that 70.5% of customers Self Enquiry and 29.5% are Company Invited, we will assume that these 25 customers (0.5% of the dataset) belong to Self Enquiry.
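A sketch of that imputation: filling the missing entries with the majority category. The sample values are stand-ins.

```python
import pandas as pd

# Stand-in column with two missing entries.
df = pd.DataFrame({
    "TypeOfContact": ["Self Enquiry", None, "Company Invited", None],
})

# No pattern was found for these missing values, so impute the
# majority class ("Self Enquiry", ~70.5% of customers).
df["TypeOfContact"] = df["TypeOfContact"].fillna("Self Enquiry")
```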

Let's find the percentage of outliers, in each column of the data, using IQR
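One way to compute that percentage is with the standard 1.5×IQR whiskers; the helper below is a sketch, and the sample column is hypothetical (20 identical incomes plus one obvious outlier, so 1 of 21 values is flagged).

```python
import pandas as pd

def outlier_pct(series: pd.Series) -> float:
    """Percentage of values outside the 1.5 * IQR whiskers."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    mask = (series < lower) | (series > upper)
    return 100 * mask.mean()

# Hypothetical column: one extreme value out of 21 (~4.76% outliers).
income = pd.Series([20000.0] * 20 + [98000.0])
print(outlier_pct(income))
```

Applying `outlier_pct` to each numerical column of the real dataset gives the per-column outlier percentages.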


Building the Model

  1. Data preparation
  2. Partition the data into train and test set.
  3. Build model on the train data.
  4. Tune the model if required.
  5. Test the data on test set.

We can see that train and test kept the proportion of the target classes (0 ~ 81% ; 1 ~ 19%)
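Keeping the class proportions in both splits comes from stratifying the split on the target. A minimal sketch with a synthetic 81/19 target standing in for ProdTaken:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic target with the ~81/19 split described above.
y = np.array([0] * 81 + [1] * 19)
X = np.arange(100).reshape(-1, 1)

# stratify=y preserves the class proportions in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
print(y_train.mean(), y_test.mean())  # both close to 0.19
```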

Model evaluation criterion

Model can make wrong predictions as:

  1. Predicting a customer will buy when the customer doesn't buy the travel package.
  2. Predicting a customer will NOT buy when the customer does buy it.

Which case is more important?

How can we reduce this loss, i.e., reduce False Negatives?

DECISION TREE BAGGING AND RANDOM FOREST

Let's define a function that reports metric scores (accuracy, recall and precision) on the train and test sets, and a function that shows the confusion matrix, so that we do not have to repeat the same code while evaluating models.
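A possible shape for the scoring helper (the function name and return format are choices made here, not fixed by the notebook):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier

def model_performance(model, X, y):
    """Return accuracy, recall and precision of `model` on (X, y)."""
    pred = model.predict(X)
    return {
        "accuracy":  accuracy_score(y, pred),
        "recall":    recall_score(y, pred),
        "precision": precision_score(y, pred),
    }

# Tiny usage example on perfectly separable toy data.
X = np.array([[0], [1], [0], [1]])
y = np.array([0, 1, 0, 1])
clf = DecisionTreeClassifier(random_state=1).fit(X, y)
scores = model_performance(clf, X, y)
print(scores)  # all three metrics equal 1.0 on this toy set
```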

Decision Tree Model

Bagging Classifier Model

Bagging Classifier with default decision tree

Bagging Classifier with a class-weighted decision tree base estimator

Random Forest Model

Random Forest with default decision tree

Random Forest with class-weighted trees

Tuning Models

Using GridSearch for Hyperparameter tuning model
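A minimal GridSearchCV sketch on synthetic data; the grid values are illustrative, and `scoring="recall"` reflects the evaluation criterion above (false negatives are the costly error).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=1)

# Illustrative grid; the notebook's actual grids may differ.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}

# Optimise for recall, since reducing false negatives is the goal here.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```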

Tuning Decision Tree

Tuning Bagging Classifier

Tuning Random Forest

Comparing all Bagging models

Conclusion:

All models are performing poorly and/or overfitting.

Feature importance of the tuned Random Forest

BOOSTING

AdaBoost Classifier - default

Gradient Boosting Classifier

XGBoost Classifier

Considering that XGBoost handles missing values natively, we will check the model's performance on the data before missing-value treatment.

Hyperparameter Tuning

AdaBoost Classifier

Gradient Boosting Classifier

Using AdaBoost classifier as the estimator for initial predictions
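A sketch of wiring this up via scikit-learn's `init` parameter, which supplies the initial predictions that the boosting stages then correct; the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=1)

# `init` must provide fit and predict_proba; AdaBoostClassifier does,
# so it can seed the gradient-boosting stages with its predictions.
gbc = GradientBoostingClassifier(
    init=AdaBoostClassifier(random_state=1),
    random_state=1,
)
gbc.fit(X, y)
```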

As compared to the model with default parameters:

Gradient Boosting Tuned

XGBoost Classifier - tuned

XGB Tuned 2 - different parameters

XGB tuned without missing values treatment

XGB tuned 2 - without missing values treatment

Stacking Classifier
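A minimal stacking sketch on synthetic data; the choice of base learners and the logistic-regression meta-model here is illustrative, not necessarily the notebook's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=1)

# The base learners' out-of-fold predictions (cv=5) become the
# features of the final logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("gb", GradientBoostingClassifier(random_state=1)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
```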

Conclusion:

Conclusion

XGB2_2 - tuned with other parameters (no missing-value treatment) is our best model: it reduced the overfitting and performs better than the other models. Its precision and F1 score are still not strong, however; the model still needs improvement, given that it achieves 0.826 recall but only 0.595 precision on the test set.

Business Recommendations

Recommendations